GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis
Authors
Abstract
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomic analysis of shotgun DNA sequences. The program includes a simple but effective algorithm, a modification of the Bayesian method, that predicts the most probable genomic origin of each sequence at different taxonomic ranks on the basis of genome databases; a function that generates genomic profiles of reference sequences from tri-, tetra-, penta-, and hexa-nucleotide motifs, used to set up a user-defined database; two output formats (tabular and tree-based summaries) that display taxonomic predictions with improved analytical methods; and effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information. GSTaxClassifier takes nucleotide sequences as input and uses a modified Bayesian model to evaluate the genomic signatures shared between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies. It is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.
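The abstract does not spell out the modified Bayesian model, so the following is only a minimal sketch of the general idea it builds on: a standard multinomial naive Bayes classifier over k-nucleotide motif counts, with uniform priors and additive smoothing. It is not GSTaxClassifier's algorithm. The function names (`kmer_counts`, `build_profile`, `classify`), the pseudocount, and the toy reference sequences are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def build_profile(seq, k, pseudocount=1.0):
    """Log-frequency of every possible k-mer in a reference sequence.

    Additive smoothing (an assumption of this sketch) keeps motifs that never
    occur in the reference from receiving a log-probability of minus infinity.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values()) + pseudocount * 4 ** k
    return {
        "".join(p): math.log((counts["".join(p)] + pseudocount) / total)
        for p in product("ACGT", repeat=k)
    }


def classify(query, profiles, k):
    """Score a query against each reference profile and return the best label.

    With uniform priors, the log-posterior of a reference is, up to a constant,
    the sum over query k-mers of count * log-frequency in that reference.
    """
    q = kmer_counts(query, k)
    scores = {
        label: sum(n * profile[kmer] for kmer, n in q.items())
        for label, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy reference "genomes" (hypothetical, for illustration only).
references = {
    "AT_rich": "ATATTTAAATATTAATTTATAAATATTTAT" * 5,
    "GC_rich": "GCGGCCGCGCCGGGCGCGCCGCGGCCGCGG" * 5,
}
profiles = {name: build_profile(seq, k=3) for name, seq in references.items()}
label, scores = classify("ATTTAAATTATATAAT", profiles, k=3)
```

Setting `k` to 3, 4, 5, or 6 corresponds to the tri-, tetra-, penta-, and hexa-nucleotide profiles mentioned above. Larger `k` gives more specific signatures but needs longer reference sequences to estimate the 4^k frequencies reliably.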
Similar resources
Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier
As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assess the organism of metagenomic sequences. We have found that the existing supervised classifiers usually ...
Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to group similar partial sequences, such as raw sequence reads, into clust...
MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences
Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it emp...
IMP: a pipeline for reproducible integrated metagenomic and metatranscriptomic analyses
We present IMP, an automated pipeline for reproducible integrated analyses of coupled metagenomic and metatranscriptomic data. IMP incorporates preprocessing, iterative co-assembly of metagenomic and metatranscriptomic data, analyses of microbial community structure and function as well as genomic signature-based visualizations. Complementary use of metagenomic and metatranscripto...
Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction
The visualization of metagenomic data, especially without prior taxonomic identification of reconstructed genomic fragments, is a challenging problem in computational biology. An ideal visualization method should, among others, enable clear distinction of congruent groups of sequences of closely related taxa, be applicable to fragments of lengths typically achievable following assembly, and all...